Skip to content
This repository was archived by the owner on Oct 29, 2024. It is now read-only.

[Fix](#649) changes to support accurate ns timepoints #650

Closed

Conversation

auphofBSF
Copy link

as documented in (#649).
use floor divided on integer division and use pandas.Timestamp for ns timestamp resolution

Extensively tested but not fully tested, unit test updates to follow
I believe address and fixes ( #527,#489, #346, #344, #340)

extensive but not fully tested, unittests updates to follow
also fixes ( influxdata#527,influxdata#489, influxdata#346, influxdata#344, influxdata#340)
@aviau
Copy link
Collaborator

aviau commented Oct 14, 2018

I'd be happy to merge if it includes unit tests.

@auphofBSF
Copy link
Author

auphofBSF commented Oct 14, 2018

Have added fixes to enable passing existing python unit tests.

However some failures persist

  • Unit Test failures are with pypy because of the now mandatory dependency on pandas to do ns precision timestamps. I have no experience with pypy can we allow dependency on pandas for pypy ?

  • The other unit test failure is flake8, I cannot figure this one out HELP please someone
    FIXED e8f2408

UNITTESTS

  • I am testing further unit test updates and additions for confirmation of ns precision conformance.
    RELEASED b195e38

@xginn8 xginn8 requested review from aviau, sebito91 and xginn8 as code owners April 7, 2019 14:44
@tpietruszka
Copy link

While I believe this PR improves the situation, I still see a problem.

When trying to write points with microsecond precision, the timestamps are sometimes erroneous (1 us too early)

Why? The precision_factor values (other than 1), in _dataframe_client.py, are floats, and so the result of floor division is also a float. For microsecond precision this poses a problem - most timestamps only have approximate float representations. If the approximation is smaller than the actual value, the next call to astype(np.int64) takes a floor value of something smaller than the value we want.

Solution:
precision_factor values should be ints. BTW, maybe they could be defined just once.

Example:

import pandas as pd
import influxdb

test_db = 'test_python_client_timestamp_precision'
col_name = 'temperature'
measurement = 'test'

df = pd.Series({pd.Timestamp('2019-04-16 16:24:56.915728Z'): 1}, name = col_name).to_frame()

client = influxdb.DataFrameClient(database=test_db)
client.create_database(test_db)
client.write_points(df, measurement, time_precision='u')

df2 = client.query(f'select * from {measurement}')[measurement]
client.drop_database(test_db)
client.close()
pd.testing.assert_index_equal(df.index, df2.index)

(the assert fails both in current master AND in a version including this PR)

The changes highlight issue brought up by @tpietruszka requiring precision factor to be of type int as apposed to currently float
influxdata#650 (comment)
The changes highlight issue brought up by @tpietruszka requiring precision factor to be of type int as apposed to currently float
influxdata#650 (comment)
The changes highlight issue brought up by @tpietruszka requiring precision factor to be of type int as apposed to currently float
influxdata#650 (comment)
@auphofBSF
Copy link
Author

The issue identified by @tpietruszka #650 (comment) is valid

It happens on certain values, I am not well experienced on numpy arithmetic particular corner cases of conversion to challenge what is correct or incorrect. But @tpietruszka code and the simple experiment of test values and different arithmetic strategies for conversion recorded below demonstrate the issue. I have changed the test values in the Unit tests to

'2013-01-01 23:10:55.123456987+00:00'

This change to unit tests is in the following branch https://github.com/auphofBSF/influxdb-python/tree/bug/microsecond_issue_tpietruszka.
Certainly all tests fail.

The @tpietruszka #650 (comment) result should be identical (There are no NanoSeconds) but where there are nanoseconds what to do ?

A QUESTION and DECISION require about Rounding or Truncating

In the following experiment where there are NanoSecond components of the time should the value be truncated to the MicroSecond or Rounded Up to the Microsecond and for that matter every conversion precision.

I am happy to assist with further alterations, thanks to @tpietruszka for identifying and solving to this stage

image

ts = pd.Timestamp('2013-01-01 23:10:55.123456987+00:00')
ts
ts_ns = np.int64(ts.value)

ts_ns
expected_ts_us = 1357081855123456

#as before Pull Request #650
result1 = np.int64(ts_ns / precision_factor)
(expected_ts_us == result1, result1)

# as in Pull Request #650
result2 = np.int64(ts_ns // precision_factor)
(expected_ts_us == result2, result2)


# Tests for Fix identified by @tpietruszka  DIVIDE
result3=np.int64(ts_ns / np.int64(precision_factor))
(expected_ts_us == result3, result3)

# Tests for Fix identified by @tpietruszka  FLOOR DIVIDE
result4=np.int64(ts_ns // np.int64(precision_factor))
(expected_ts_us == result4, result4)

from six import iteritems, binary_type, text_type, integer_types, PY2

EPOCH = UTC.localize(datetime.utcfromtimestamp(0))
import pandas as pd # Provide for ns timestamps
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add pandas to requirements.txt (and remove pytz and python-dateutil), build is currently failing because of that, see pypy build logs: https://travis-ci.org/influxdata/influxdb-python/jobs/516866320

Great fix btw, looking forward to being merged.

also updated inline docs
detail significance of precision factor to be of type int
as identified by @tpietruszka when precision factor
is a float, result is an approximation.
also define precision factors in 1 place as int's and improved
consistency in usage.
@auphofBSF auphofBSF force-pushed the bug/timestamp_ns_issue_649 branch from 7ac0dbf to f22ddff Compare October 10, 2019 00:14
@auphofBSF
Copy link
Author

The commits now include fixes for @tpietruszka identified issues with precision conversion. Test cases have been updated and pass.

The only failures still exist with pypy. Following @lipi recommendation in #650 (comment), enabled pypy3 to pass but then failed on all standard python implementations clash of tox.ini deps: and requirements.txt duplicate requirements.
Additionally pypy2.7 fails with pandas having a dependency on numpy 1.17 which only supports python >3.5.
I have a seperate local branch https://github.com/auphofBSF/influxdb-python/tree/test/pypy27pandas where I will explore fixing this

@lipi

  1. do you have ANY further suggestions regarding PYPY2.7 and pandas
  2. how best to add pandas to requirements and have CI tests still use there depend in tox.ini

@sebito91 sebito91 removed request for aviau and xginn8 April 8, 2020 20:05
@clslgrnc
Copy link
Contributor

clslgrnc commented Apr 8, 2020

#407 seemed much more lightweight to solve the problem. I am not sure why panda is needed just for time conversion, and I do not really understand why all the powers of 10 are defined as floats with the notation 1eX (which result in a float in python) then converted to numpy 64 bits ints. Why not just use 10 ** X which is already a python integer with arbitrary precision?

This was removed in an upstream PR but reintroduced incorrectly here.
@sebito91
Copy link
Contributor

sebito91 commented Apr 8, 2020

If this still fails pypy inclusion I will re-address #407.

@sebito91
Copy link
Contributor

sebito91 commented Apr 8, 2020

Duplicate of #407 without pandas requirement

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants